Standard Error Considerations on AFM Parameters 


Guillaume Durand 
National Research Council 
Canada 
100 rue des Aboiteaux 

Moncton, NB, Canada 
Guillaume.Durand@nrc.ca 


ABSTRACT 


Knowledge tracing is a fundamental area of educational data 
modeling that aims at gaining a better understanding of the 
learning occurring in tutoring systems. Knowledge tracing 
models fit various parameters on observed student perfor- 
mance and are evaluated through several goodness of fit met- 
rics. Fitted parameter values are of crucial interest in order 
to diagnose learning mastery as well as knowledge models 
and qualitative aspects of the learning environment. Unfor- 
tunately, parameter values are rarely associated with stan- 
dard errors or confidence intervals, both of which are criti- 
cal information to validate the inferences that can be made 
from the model. Taking the example of the Additive Factor 
Model, we describe how to obtain standard errors on the 
model parameters. We propose two methods to compute 
those and discuss results obtained on a public dataset. 


Keywords 


Parameters standard error, Additive Factor Model 


1. INTRODUCTION 


Educational Data Mining (EDM) has already produced nu- 
merous predictive models to accurately detect, anticipate 
and measure meaningful outcomes of learning activities. Pre- 
dicting student performance has been available for years. 
For instance, it was the goal of the Knowledge Discovery 
and Data mining (KDD) Cup 2010 [1], where teams around 
the world competed to get the most accurate predictions 
on student test item successes. While predictive accuracy
and overall model goodness of fit remain central concerns,
other considerations have since emerged in the EDM scientific
community. Model usefulness is one of them. A model
can be accurate in its predictions yet add little educational
value to a learning environment [10].
Another concern, of even greater interest for the work pre- 
sented in this paper, is the identifiability of the models pro- 
duced and used by the EDM community. The cognitive 
models we use for knowledge tracing are validated towards 
their predictive quality but their prediction performance is
not necessarily where they are most useful. This is the case, 
for instance, for the Additive Factor Model (AFM) [3] or 
the Bayesian Knowledge Tracing model (BKT) [5]. Both are 
widely used in intelligent tutoring systems to detect when a 
student has mastered a skill [15] in order to provide her with 
the next adequate learning material. In this situation, BKT 
is not used only to evaluate the probability that the student 
will give a correct answer at time t. It is also used to check 
whether the “p_known” value calculated on fitted model pa- 
rameters has reached the 0.95 threshold [15]. In that case, 
inferring learning mastery based on fitted parameter values 
is risky when there is uncertainty on the fitted values. First, 
there is a risk that different combinations of parameters may 
yield functionally identical models that explain observations 
in the same way. This is known as the identifiability issue, an 
important problem that keeps being discussed and solved in 
the BKT community [2, 7]. A second issue involves the reliability
of, and confidence in, the fitted parameter values. In other
words, how sure are we of the fitted parameter value that
will be used to infer that the learning mastery threshold has
been reached? That issue has been of primary importance
in recent uses of AFM to perform advanced learning factor
analysis in the field [8] or when building tools that tentatively
offer guidance for building competency frameworks [9]. For
instance, Durand et al. [9] describe a situation where a skill
was first fitted as fairly difficult (low $\beta$) with a fast learning
rate (high $\gamma$). After a small modification of the training
dataset, the same skill was estimated as easy (large $\beta$) with no
learning (small $\gamma$). In addition, it is also known from the
literature that latent variable models, including skill-based 
cognitive models such as AFM, are difficult to estimate pre- 
cisely [18]. In light of these results, it becomes crucial to 
take a closer look at the uncertainty on model parameters, 
beyond predictive accuracy. Quantifying the uncertainty on 
fitted parameter values by estimating their standard error 
appears necessary in order to increase our ability to make 
correct, and hopefully useful, inference from fitted models. 


The rest of the article is organized as follows. The next
section presents related work. Section 3 presents the AFM
model, its use for diagnosing learning, and the computation 
of the standard error on fitted parameter values, using two 
different techniques. Experimental results on several cogni- 
tive models from the PSLC-Datashop [11] are presented in 
Section 4 and discussed in Section 5. We then summarize 
the contributions presented in this paper and their impact 
on future developments. 


2. RELATED WORK


A recent and fundamental paper by Philipp et al. [17] investigates
the estimation of standard errors in cognitive diagnostic
models. Clearly identifying the need to assess the
uncertainty of the estimated model parameters using confidence
intervals, they presented the theoretical background
for estimating parameter standard errors for the G-DINA
cognitive diagnostic model [17]. In their explanations, they
essentially presented and discussed different ways of computing
standard errors by considering either the complete
or the incomplete information matrix. Their experiments
highlighted the necessity of using the complete information
matrix rather than the incomplete one to compute parameter
standard errors. This result, while interesting, was not
the only focus of our interest.
The authors detailed two ways of computing both the complete
and incomplete information matrices in the context of
G-DINA that were of primary relevance for an application
to AFM. The first way uses an Outer Product of Gradient
(OPG) estimator. This estimator has the advantage of being
relatively easy to implement, but it is slightly less precise than
the method using the Hessian of the log-likelihood, which
in turn has the drawback of being more cumbersome to implement.
In our experiments we used the Hessian estimator of the
information matrix.


Computing the standard error of parameter estimates
is a classic approach in statistics, and a dense literature
details its applications. However, it seems to have
drawn limited interest in the EDM community so far, as
we did not find implementation examples in the EDM literature.
Nevertheless, a connecting point could be found
in the renewed interest in model identifiability issues [2, 7].
Identifiability issues can lead to an information matrix that
is ill-conditioned and cannot be inverted. As we will see
later, parameter standard errors are obtained by inverting the
information matrix estimated with the OPG or Hessian approach. If
the information matrix cannot be inverted, no standard
error can be obtained by these methods. Philipp
et al. mentioned that such a situation can occur in the DINA
model [6] whenever a “test does not involve a single-attribute
item for each of the K attributes” [17]. This is a result we
intuitively implemented in rules when guiding competency
framework refinement with AFM [9]. However, this intuitive
rule turns out to be a requirement for standard error
estimation. While BKT identifiability conditions are starting
to be well documented, we have not been able to find
an equivalent for AFM and we hope that the scientific community
will address this issue. The main objective of this
contribution is to present, illustrate, and discuss the implementation
of AFM parameter standard error estimation. To
the best of our knowledge, this has not yet been addressed
in the literature.


3. THE ADDITIVE FACTOR MODEL 
The AFM [3] models the probability that a student $i$ succeeds
on an item $j$ by a mixed-effect logistic regression:

$$P(Y_{ij} = 1 \mid \alpha_i, \beta, \gamma) = \mathrm{logit}^{-1}\left(\alpha_i + \sum_{k=1}^{K} \beta_k q_{jk} + \sum_{k=1}^{K} \gamma_k q_{jk} t_{ik}\right) \tag{1}$$

where $\mathrm{logit}^{-1}(x) = 1/(1+e^{-x})$. Parameters $\alpha_i$, $\beta_k$ and $\gamma_k$
represent the proficiency of student $i$, the easiness of skill $k$ and
the learning rate for skill $k$, respectively.¹ The Q-matrix $Q = [q_{jk}]$,
also known as the Knowledge Component model in the
PSLC-Datashop [11], represents the item-to-skills mapping
by a binary matrix, as in the following example:


              Skill 1  Skill 2  Skill 3
     Item A      1        0        0
Q =  Item B      0        1        0
     Item C      1        1        0
     Item D      0        0        1


where items A, B and D evaluate one skill each, and item C 
evaluates two. 


Finally, variable $t_{ik}$ is the number of times student $i$ has
practiced skill $k$, also known as the opportunity number.
Parameters $\beta$ and $\gamma$ are key differentiators for AFM as a
cognitive diagnostic model [8]. They model the learning
process for each skill, making AFM a powerful and
distinctive model to finely characterize the acquisition of skills
[8]. Learning parameters make it possible to plot useful learning
curves detailing skill acquisition.
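
As a minimal sketch of Eq. 1, the function below computes the success probability for one student-item pair from a row of the Q-matrix and the student's opportunity counts. The parameter values and the names `afm_probability`, `beta`, `gamma` and `opportunities` are hypothetical, chosen only for illustration; they are not fitted values from this paper.

```python
import math

def inv_logit(x):
    """Inverse logit: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def afm_probability(alpha_i, beta, gamma, q_row, t_row):
    """P(Y_ij = 1) from Eq. 1 for one student-item pair.

    q_row: row j of the Q-matrix (0/1 per skill)
    t_row: opportunity counts t_ik of student i, one per skill
    """
    logit = alpha_i
    for b, g, q, t in zip(beta, gamma, q_row, t_row):
        logit += q * (b + g * t)  # only skills with q_jk = 1 contribute
    return inv_logit(logit)

# Q-matrix of the example above (rows: items A-D, columns: skills 1-3).
Q = {
    "A": [1, 0, 0],
    "B": [0, 1, 0],
    "C": [1, 1, 0],
    "D": [0, 0, 1],
}

# Hypothetical parameter values (assumption, not taken from the paper).
beta = [0.5, -0.3, 1.0]
gamma = [0.1, 0.2, 0.0]
opportunities = [2, 1, 0]

# Item C involves skills 1 and 2, so both contributions are added.
p_item_c = afm_probability(0.0, beta, gamma, Q["C"], opportunities)
```

Because item C maps to two skills, its logit is the sum of both skill contributions, which is the additive behaviour the model is named after.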


3.1 Learning curves 

Learning curves are an essential tool to improve learning
systems. They “give us a measure of the amount of learning
that is taking place relative to the system’s model”, allowing
us to compare and improve them [14]. Concretely, a learning
curve is a “graph that plots performance on a task versus
the number of opportunities to practice” [14]. The performance
measured can be the time spent assembling an engine
component in a production line or, as is often the case in
the educational field, the error rate when applying a set of
skills or an individual skill.


Displaying learning curves in multidimensional learning environments
can be difficult. Those environments are not
necessarily built for measuring the learning of single skills, and
they usually combine different sets of skills evaluated together
(multidimensionality). In such a situation, we need to
“retrofit” the analysis, and AFM is well suited to
that as it tries to detect each skill's specific (additive) contribution
towards each item success.


When modeling learning performance over time, learning curves
follow a “power law of practice” [16], which states that performance
should improve over time following a power law.
In the Intelligent Tutoring Systems (ITSs) context, we can
expect the error rate to drop as a power law over practice
opportunities. ITSs, or sections of them, can be compared
by considering the steepness of the curve. A steeper
curve indicates a faster acquisition of the skills practiced
[14].


Another advantage of using AFM to draw learning curves is
that we can compensate for the attrition bias. Over time,
fewer learners tend to perform the items because many of
them have learned the skill, and the curves tend to quickly
degenerate, impacting the value of slopes and the power law
fit.

¹ We refer to $\beta$ and $\gamma$ as the skill and learning parameters
in the rest of the article.

[Plot title: Learning Curve for Skill k]
Figure 1: Example of an error curve for a moderately hard
skill with a moderately fast learning rate.

A convenient way to produce a learning curve for skill
$k$ in AFM is to use Eq. 1 with $\beta_k$, $\gamma_k$, and a "typical"
value of the student proficiency. Using $\alpha_i = 0$ is convenient,
and usually roughly corresponds to the average value of the
estimated $\alpha$'s. This individual theoretical learning curve for
skill $k$ is given by:


$$LC_k(t) = \mathrm{logit}^{-1}(\beta_k + \gamma_k t) = \frac{1}{1 + \exp(-\beta_k - \gamma_k t)} \tag{2}$$


Typically, we consider error curves when talking about learning
curves. The error curve is obtained by plotting $EC_k(t) =
1 - LC_k(t)$, as illustrated in Figure 1.
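
The learning and error curves of Eq. 2 can be sketched in a few lines. The values of `beta_k` and `gamma_k` below are hypothetical, picked to mimic a moderately hard skill with a moderately fast learning rate; they are not the values behind Figure 1.

```python
import math

def learning_curve(beta_k, gamma_k, t):
    """LC_k(t) from Eq. 2: success probability for a student with alpha = 0."""
    return 1.0 / (1.0 + math.exp(-beta_k - gamma_k * t))

def error_curve(beta_k, gamma_k, t):
    """EC_k(t) = 1 - LC_k(t)."""
    return 1.0 - learning_curve(beta_k, gamma_k, t)

# Hypothetical skill parameters (assumption, for illustration only).
beta_k, gamma_k = -0.5, 0.25

# Error rate at opportunities 0..10; it decreases whenever gamma_k > 0.
errors = [error_curve(beta_k, gamma_k, t) for t in range(11)]
```

With these values the error rate starts above 0.5 and crosses 0.5 at the second opportunity, where $\beta_k + \gamma_k t = 0$.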


3.2 Computing the Standard Error 

We present two methods to estimate the standard errors 
on parameters. The first one is a classical approach in the 
statistics literature. It involves the computation of the nega- 
tive Hessian of the log-likelihood. The second one is inspired 
by the parametric bootstrap and estimates the standard er- 
ror by computing empirical standard deviations on the pa- 
rameters obtained from simulated observation samples. 


3.2.1 Negative Hessian of the log-likelihood 
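Technically, the standard errors of estimated parameters can
be retrieved from the covariance matrix of the parameters
(Eq. 3). More precisely, they are equal to the square root of
the diagonal elements in:

$$\mathrm{Cov}(\alpha, \beta, \gamma) = \begin{pmatrix} V_{\alpha} & V_{\alpha,\beta} & V_{\alpha,\gamma} \\ V_{\beta,\alpha} & V_{\beta} & V_{\beta,\gamma} \\ V_{\gamma,\alpha} & V_{\gamma,\beta} & V_{\gamma} \end{pmatrix}. \tag{3}$$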


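However, this covariance matrix is not known and we need
to estimate it in order to compute our standard errors. Fortunately,
the estimation of covariance matrices has been
of interest to statisticians for a long time and several ways
have been proposed to perform it. More precisely, it turns out
that the covariance matrix is equal to the inverse of the
information matrix [17], $\mathrm{Cov}(\alpha, \beta, \gamma) = \mathcal{I}(\alpha, \beta, \gamma)^{-1}$. This
means we can compute estimators of the standard deviation of
parameter estimates as long as we can compute and invert
the information matrix. At the maximum likelihood, $\mathcal{I}$ is
given by the negative Hessian matrix of the log-likelihood $L$:

$$\mathcal{I}(\alpha, \beta, \gamma) = - \begin{pmatrix} \frac{\partial^2 L}{\partial \alpha^2} & \frac{\partial^2 L}{\partial \alpha \partial \beta} & \frac{\partial^2 L}{\partial \alpha \partial \gamma} \\ \frac{\partial^2 L}{\partial \beta \partial \alpha} & \frac{\partial^2 L}{\partial \beta^2} & \frac{\partial^2 L}{\partial \beta \partial \gamma} \\ \frac{\partial^2 L}{\partial \gamma \partial \alpha} & \frac{\partial^2 L}{\partial \gamma \partial \beta} & \frac{\partial^2 L}{\partial \gamma^2} \end{pmatrix} \tag{4}$$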


In our implementation of AFM, we use a penalized version 
of the log-likelihood, as detailed in [8], and adapt Eq. 4 
accordingly. 
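
To illustrate the path from information matrix to standard errors, the sketch below treats a deliberately simplified case: a single skill with $\alpha = 0$ and no penalization, so that only $\beta_k$ and $\gamma_k$ are estimated. For this logistic model the negative Hessian has a closed form. The function names and the data layout are assumptions made for the example; this is not the penalized, full-model computation used in our experiments.

```python
import math

def information_matrix(beta_k, gamma_k, opportunities):
    """Negative Hessian of the log-likelihood for logit(p) = beta_k + gamma_k * t.

    opportunities: one opportunity count t per observation.
    """
    i_bb = i_bg = i_gg = 0.0
    for t in opportunities:
        p = 1.0 / (1.0 + math.exp(-beta_k - gamma_k * t))
        w = p * (1.0 - p)          # weight of one Bernoulli observation
        i_bb += w
        i_bg += w * t
        i_gg += w * t * t
    return [[i_bb, i_bg], [i_bg, i_gg]]

def standard_errors(info):
    """Square roots of the diagonal of the inverse of a 2x2 information matrix."""
    (a, b), (c, d) = info
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("information matrix is singular; no standard error")
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return math.sqrt(d / det), math.sqrt(a / det)

# Hypothetical design: 20 students, each seen at opportunities 0..9.
opportunities = list(range(10)) * 20
se_beta, se_gamma = standard_errors(information_matrix(0.2, 0.15, opportunities))
```

When every observation shares the same opportunity count, the two columns of the design are collinear, the matrix is singular, and the function raises an error. This is a small-scale version of the inversion failure that identifiability problems cause.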


3.2.2 Simulation 

Keeping in mind that “a standard error is the standard deviation
of the distribution of parameter estimates over multiple
samples” [20], we simulate multiple samples from the
initial data, estimate parameters on each sample, and calculate
the empirical standard deviation of these results:


Algorithm 1: Pseudo-code of the simulated standard error
estimation function. Values in square brackets are defaults.

Data: Q-matrix $Q$, first attempt observations $O$, and $\alpha$, $\beta$, $\gamma$ parameter values
Parameters: penalization parameter $\lambda$ [1], number of simulations $n$ [1000]
Result: std($\alpha$), std($\beta$), std($\gamma$)

Compute $P(Y_{ij} = 1 \mid \alpha_i, \beta, \gamma)$ according to Eq. 1 for each first attempt observation $O_{ij}$;
repeat
    Create $R$, a matrix of the same size as $P$, with random values between 0 and 1;
    Create $O'$, a matrix equal to $O$;
    for each first attempt observation $O_{ij}$ do
        if $R_{ij} > P(Y_{ij})$ then
            $O'_{ij} \leftarrow 0$;
    Estimate $\alpha$, $\beta$, $\gamma$ for this simulation iteration with respect to $Q$ and $O'$;
until $n$ simulation iterations;
std($\alpha$) $\leftarrow$ standard deviation of the $n$ simulated estimates of $\alpha$;
std($\beta$) $\leftarrow$ standard deviation of the $n$ simulated estimates of $\beta$;
std($\gamma$) $\leftarrow$ standard deviation of the $n$ simulated estimates of $\gamma$;


This simulation approach was intended as an alternative
method to validate the Hessian-based estimates detailed in the
previous section, but also as a fallback should
inverting the Hessian matrix prove impossible or too
cumbersome to implement outside of our experimental environment.
The simulation takes as input a Q-matrix and performance
observations. It fits the AFM parameters before
computing a prediction for each observation. If the prediction
is below a random value uniformly distributed between
0 and 1, then the observation is changed to a failure. Then we
iterate again by computing new values of AFM parameters
on the new observations dataset, computing the predictions
and creating another observations sample. The pseudo-code
of this simulation process is presented in Algorithm 1.
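
The resampling loop of Algorithm 1 can be sketched as follows. This is not the Go implementation used in our experiments: the estimator is passed in as a callable, and the one-parameter `fit_success_logit` is a hypothetical stand-in for AFM estimation so that the example stays self-contained. The sketch follows Algorithm 1 literally, computing the probabilities once and only ever changing observations to failures.

```python
import math
import random
import statistics

def simulated_standard_errors(observations, probabilities, fit, n=1000, seed=0):
    """Resample observations as in Algorithm 1, refit, and take empirical stds.

    observations:  list of 0/1 first-attempt outcomes O
    probabilities: fitted P(Y = 1) for each observation (Eq. 1)
    fit:           callable mapping a list of 0/1 outcomes to a list of
                   parameter estimates (stand-in for AFM estimation)
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        simulated = list(observations)       # O' starts as a copy of O
        for idx, p in enumerate(probabilities):
            if rng.random() > p:             # R_ij > P(Y_ij)
                simulated[idx] = 0           # observation becomes a failure
        estimates.append(fit(simulated))
    # One empirical standard deviation per parameter.
    return [statistics.stdev(column) for column in zip(*estimates)]

def fit_success_logit(outcomes):
    """Hypothetical estimator: smoothed logit of the overall success rate."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return [math.log((successes + 0.5) / (failures + 0.5))]

# Hypothetical data: 200 observations with a constant fitted probability.
observations = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0] * 20
probabilities = [0.7] * len(observations)
std_errors = simulated_standard_errors(
    observations, probabilities, fit_success_logit, n=200
)
```

Passing the estimator as an argument keeps the resampling logic separate from the model, so the same loop could wrap any AFM fitting routine that returns a flat list of parameters.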


We also tried another estimation method using a jackknife
approach (iterative leave-one-out on students), which provided
us with overly optimistic values. Standard errors were clearly
underestimated on the PSLC dataset we experimented with.


Table 1: Overall predictive quality of KC models as computed
by PSLC-Datashop

| Model Name | KCs | #Obs. | AIC   | BIC   | RMSE    |
|------------|-----|-------|-------|-------|---------|
| Arith0     | 18  | 5,104 | 4,948 | 5,569 | .397095 |
| Context    | 12  | 5,104 | 5,030 | 5,573 | .399431 |
| Original   | 15  | 5,104 | 5,180 | 5,762 | .407192 |


4. EXPERIMENTS 
4.1 Dataset 


In our experiments, we used the “Geometry Area (1996-97)” 
dataset from DataShop [11]. It contains 6778 observations 
of the performance of 59 students completing 139 unique 
items from the “area unit” of the Geometry Cognitive Tutor 
course (school year 1996-1997). This is a classic Datashop 
collection, associated with many prior publications [3, 4, 12, 
13]. We selected three Knowledge Component (KC) models
to run our experiments:


• hLFASearchAICWholeModel3arith0 (Arith0 henceforth);

• hLFASearchModel1-context (Context hereafter);

• Original.


They were selected for their reasonable numbers of skills and
observations, but also because they have distinctive goodness
of fit metrics that make it possible to differentiate their predictive
qualities. Characteristics of these KC models, as reported in
Datashop, are presented in Table 1. These metrics suggest that the
best predictive model would be Arith0, followed by Context
and Original. The number of skills (KCs) does not seem to
correlate with the goodness of fit for these models.


4.2 Method 


Our implementations are done using Matlab and Octave.²
The AFM estimation used in previous work [8, 9] was extended
with the developments described above. The Hessian
of the log-likelihood was computed using an off-the-shelf
numerical method based on a central difference approximation.³
This has the advantage of requiring no calculus for computing
second derivatives, but has the disadvantage of being
notably slower than direct Hessian computation. The full
Hessian computation takes around three hours on a regular
laptop for each of the KC models. The simulation-based estimates
were obtained using a Go language implementation
of AFM parameter estimation. It takes less than 15 minutes
in Go to compute 1000 simulation iterations.
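
For readers unfamiliar with central difference Hessians, the following is a generic sketch of the technique. It is not the Octave routine we used, and `numerical_hessian`, its step size `h` and the test function `quadratic` are names and choices introduced here for illustration only.

```python
def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at point x.

    f: callable taking a list of floats and returning a float
    x: list of floats, the point at which to evaluate the Hessian
    """
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j, dj):
        # Evaluate f after moving coordinate i by di and coordinate j by dj.
        point = list(x)
        point[i] += di
        point[j] += dj
        return f(point)

    for i in range(n):
        for j in range(i, n):
            value = (
                shifted(i, h, j, h) - shifted(i, h, j, -h)
                - shifted(i, -h, j, h) + shifted(i, -h, j, -h)
            ) / (4.0 * h * h)
            hessian[i][j] = hessian[j][i] = value  # the Hessian is symmetric
    return hessian

def quadratic(v):
    """Toy function whose exact Hessian is [[2, 3], [3, 4]]."""
    return v[0] ** 2 + 3.0 * v[0] * v[1] + 2.0 * v[1] ** 2

hessian_example = numerical_hessian(quadratic, [1.0, 2.0])
```

Each entry costs four function evaluations, so the total number of log-likelihood evaluations grows with the square of the number of parameters. This is consistent with the long running time reported above for a model with one parameter per student and two per skill.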


4.3 Results 


Table 5 shows the estimated values and standard errors for
learning parameters $\beta$ and $\gamma$ for KC models Arith0, Context
and Original. At first glance, we can see that none of the
parameters take large values compared to the others. This
suggests that the KC models are of excellent quality. Overall,
inter-model differences in parameter values and standard
errors are also relatively small.


² Octave/Matlab implementations are available on request.
³ Octave Optim package, numhessian function.


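Table 2: Mean parameter values (standard deviations in parentheses)

| KC Model | $\alpha$ | $\beta$      | $\gamma$    |
|----------|----------|--------------|-------------|
| Arith0   | 0 (.639) | .367 (1.261) | .199 (.269) |
| Context  | 0 (.647) | .205 (1.323) | .185 (.327) |
| Original | 0 (.624) | .308 (.877)  | .147 (.127) |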


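Table 3: Mean standard errors computed with the Hessian (standard deviations in parentheses)

| KC Model | $\alpha$    | $\beta$     | $\gamma$    |
|----------|-------------|-------------|-------------|
| Arith0   | .366 (.149) | .349 (.137) | .083 (.075) |
| Context  | .364 (.149) | .320 (.175) | .073 (.093) |
| Original | .361 (.149) | .284 (.073) | .051 (.038) |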


Mean parameter values in Table 2 show that
all models share the same (at .001 precision) mean and almost
identical standard deviations of $\alpha$. This suggests that
changing the KC model had a limited impact on students'
proficiencies. In other words, students' proficiencies remain
consistently estimated from one model to another. It seems
unlikely that a student's proficiency would drastically change
from one model to another. Interestingly, the mean values of
$\gamma$ are higher in the better models, but the standard deviation
also increases, suggesting higher values with more variance.
If we look at the mean standard errors in Table 3, we notice
that they are very similar between models for $\alpha$, suggesting again
a limited impact of the KC models on students' proficiencies.
However, the values obtained for the learning parameters
are very interesting, as the mean standard errors increase
with the predictive quality of the models. One would have
expected the opposite, as Arith0 is expected to
have a better fit of the observations than Original. In addition,
standard deviations of the standard errors are also higher for
Arith0 than for Original. One assumption could be that Arith0
managed to get a few better curves along with more bad ones and
fewer averagely good ones. More investigation would be necessary
to clarify this point.


5. DISCUSSION 
5.1 Model goodness of fit 


The dataset used in this experiment is well suited to
learning factor analysis and it is advertised as a good
one to showcase PSLC-Datashop features. Consequently, the
discrepancy obtained between goodness of fit and mean standard
error may not generalize to other situations. In addition,
we have little knowledge of the intention that led to
the design of these KC models. Those cautionary considerations
made, we have still been able to characterize a situation
where an overall better model does not necessarily lead
to a more reliable KC model. This is an interesting result,
for instance, if we want to automatically refine models
as in learning factor analysis, as it would imply looking not only
at model goodness of fit but also at KC model goodness of
fit. Standard errors can also inform us about the problematic
skills to modify, as they allow us to get a better grasp of the
reliability of learning parameters for each skill.


5.2 Learning detection 

[Plot title: LCs in 95% CI for Arith0 Geometry]
Figure 2: A skill with a flat curve suggesting limited learning
for most values in the 95% confidence interval

Figure 3: A skill with a steep curve clearly showing learning
for all values in its 95% confidence interval


Standard errors allow us to compute confidence intervals
on parameters and learning curves. Figures 2 and 3 plot
learning curves for a skill with low difficulty and no learning
(Fig. 2) and a difficult skill with a fast learning rate (Fig.
3). In both cases, the "Fitted" learning curve uses the fitted
learning parameters, the "Upper" curve is obtained using
the parameters at the lower end of the confidence interval
(1.96 × StdErr below the fitted values), and the "Lower" curve
uses the parameters at the top of the C.I. (1.96 × StdErr above
the fitted values). The Upper and Lower curves provide us with the
extreme slopes that the learning curve can take in a 95%
confidence interval, and show the range of difficulty the skill
can take while still remaining in the confidence interval.
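
The construction of the three curves can be sketched as below, assuming (as in Section 3.1) that the plotted quantity is the error curve, so that lower parameter values give the higher curve. The function `confidence_band` is a name introduced for this example; the numerical inputs are the Arith0 "Geometry" row of Table 5.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval

def learning_curve(beta_k, gamma_k, t):
    """LC_k(t) from Eq. 2."""
    return 1.0 / (1.0 + math.exp(-beta_k - gamma_k * t))

def confidence_band(beta_k, se_beta, gamma_k, se_gamma, opportunities):
    """Return the (upper, fitted, lower) error curves described in the text."""
    def errors(b, g):
        return [1.0 - learning_curve(b, g, t) for t in opportunities]

    # Upper error curve: both parameters at the lower end of their interval.
    upper = errors(beta_k - Z_95 * se_beta, gamma_k - Z_95 * se_gamma)
    fitted = errors(beta_k, gamma_k)
    # Lower error curve: both parameters at the upper end of their interval.
    lower = errors(beta_k + Z_95 * se_beta, gamma_k + Z_95 * se_gamma)
    return upper, fitted, lower

# Arith0 "Geometry" skill (Table 5): beta = 0.781 (StdErr 0.260),
# gamma = 0.000 (StdErr 0.036).
upper, fitted, lower = confidence_band(0.781, 0.260, 0.000, 0.036, range(0, 21))
```

Moving both parameters to the same end of their intervals ignores the covariance between $\beta_k$ and $\gamma_k$, so the resulting band is a simple envelope rather than a joint confidence region.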


Some values taken by these curves are not possible in practice.
For instance in Figure 2, the Upper curve is impossible
under AFM parameter fitting constraints, as $\gamma$ is constrained
to be non-negative. On the other hand, the Lower curve can be
observed and shows limited learning. In this configuration of
learning parameters, stating no learning after looking only
at the Fitted learning curve could be an overstatement even
though it is very likely that no learning is occurring. However,
as Murray et al. [15] showed, flat aggregated curves
showing no learning could, in fact, hide the learning occurring
for subgroups of students. In their study of an algebra
curriculum containing performance data of 15,414 students
on 881 skills, they discovered that around 16% of skills were
misidentified as showing no learning. Standard error computation
gives another reason why we should be cautious when
claiming no learning. But can standard errors help us claim
learning? The skill in Figure 3 answers this question. We
can see that all the difficulties and slopes that can be taken
in the 95% confidence interval lead to the conclusion that this
skill is learned. In conclusion to this subsection, considering
fitted parameter standard errors is important to confirm
that learning is occurring but not necessarily the opposite.


Table 4: RMSE and $r^2$ computed between the Hessian and
the simulation standard errors

| KC Model | RMSE $\alpha$ | RMSE $\beta$ | RMSE $\gamma$ | $r^2$ $\alpha$ | $r^2$ $\beta$ | $r^2$ $\gamma$ |
|----------|---------------|--------------|---------------|----------------|---------------|----------------|
| Arith0   | .052          | .050         | .022          | .906           | .963          | .987           |
| Context  | .053          | .061         | .020          | .890           | .900          | .973           |
| Original | .047          | .026         | .004          | .917           | .947          | .992           |


5.3. Simulation and Hessian methods 

Table 5 shows that standard errors computed from the log-likelihood
Hessian and by simulation are very close. This
means that our simulation method can potentially provide an estimate
of the standard errors when the Hessian is hard to compute
or invert. This also confirms the validity of our simulation
results. Table 4 shows the Root Mean Square Error
(RMSE) and correlation ($r^2$) between the simulation estimates
and the Hessian-based standard errors over all parameters of each
KC model. Although not insignificant, the difference between
the two methods is sufficiently small, and the value of
$r^2$ large enough, to consider that simulation results provide
good estimates of the standard errors on parameters.
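
The two agreement measures can be computed as in the sketch below. We assume here that $r^2$ denotes the squared Pearson correlation; the function names are introduced for illustration. The inputs are the $\beta$ standard errors of the first five Arith0 skills in Table 5, a subset used only to keep the example short, so the outputs are not expected to match Table 4.

```python
import math

def rmse(a, b):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def r_squared(a, b):
    """Squared Pearson correlation between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

# Standard errors on beta for the first five Arith0 skills in Table 5.
hessian_se = [0.233, 0.617, 0.374, 0.272, 0.260]
simulated_se = [0.224, 0.659, 0.399, 0.250, 0.197]

agreement_rmse = rmse(hessian_se, simulated_se)
agreement_r2 = r_squared(hessian_se, simulated_se)
```

RMSE measures absolute disagreement in the units of the standard errors, while $r^2$ measures whether the two methods rank and scale the skills in the same way, which is why both are reported.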


6. CONCLUSION AND FUTURE WORK 


Estimating the reliability of parameter estimates is a crucial 
aspect of model inference. We showed how to compute stan- 
dard errors on AFM model parameters, and applied the pro- 
posed methods to public datasets from the PSLC Datashop. 
This yields several observations. 


First, the more accurate model is not always the one with 
the better KC model: parameter validity and predictive abil- 
ity are different. That confusion is not new however and al- 
lowed progress in cognitive psychology in the first half of the 
nineteenth century before the community realized it failed 
to “provide a strong foundation for deducing likely relation- 
ships among variables, and hence for the development of 
generative theory”[19]. 


Second, standard errors, and the associated confidence intervals,
provide precious insight into learning. However, characterizing
the absence of learning is more complicated, especially
when $\gamma$ is less reliable.


Finally, standard errors on parameters can be easily esti- 
mated by the simulation method we describe. This can be 


Table 5: Estimated parameters and standard errors for several PSLC models.

| Model    | Skill | $\beta$ | StdErr $\beta$ | Simul. $\beta$ | $\gamma$ | StdErr $\gamma$ | Simul. $\gamma$ |
|----------|-------|---------|----------------|----------------|----------|-----------------|-----------------|
| Arith0   | Geometry*parallelogram-area | 1.939 | 0.233 | 0.224 | 0.028 | 0.016 | 0.016 |
| Arith0   | Geometry*parallelogram-area*Textbk_New_Decomp... | 2.540 | 0.617 | 0.659 | 0.180 | 0.149 | 0.192 |
| Arith0   | Geometry*Textbk_New_Decompose-circle-area | 1.136 | 0.374 | 0.399 | 0.183 | 0.093 | 0.111 |
| Arith0   | arithmetic | 1.992 | 0.272 | 0.250 | 0.027 | 0.023 | 0.022 |
| Arith0   | Geometry | 0.781 | 0.260 | 0.197 | 0.000 | 0.036 | 0.021 |
| Arith0   | Geometry*decomp-trap*trapezoid-area | -0.624 | 0.200 | 0.202 | 0.092 | 0.017 | 0.017 |
| Arith0   | Geometry*ALT:TRIANGLE-AREA | 1.501 | 0.341 | 0.260 | 0.000 | 0.056 | 0.035 |
| Arith0   | Geometry*ALT:TRIANGLE-AREA-PART | 0.204 | 0.400 | 0.416 | 0.230 | 0.124 | 0.132 |
| Arith0   | Geometry*compose-by-multiplication | -0.675 | 0.390 | 0.400 | 0.267 | 0.121 | 0.126 |
| Arith0   | Geometry*pentagon-area | -0.550 | 0.199 | 0.200 | 0.110 | 0.015 | 0.016 |
| Arith0   | Geometry*ALT:CIRCLE-AREA-INDIRECT | -0.268 | 0.305 | 0.306 | 0.312 | 0.066 | 0.071 |
| Arith0   | Geometry*Textbk_New_Decompose-circle-area*circle... | 0.871 | 0.255 | 0.258 | 0.073 | 0.030 | 0.031 |
| Arith0   | Geometry*ALT:CIRCLE-AREA | 0.973 | 0.280 | 0.281 | 0.124 | 0.039 | 0.042 |
| Arith0   | Geometry*circle-area | -0.393 | 0.348 | 0.342 | 0.171 | 0.089 | 0.093 |
| Arith0   | Geometry*circle-diam-from-subgoal | 0.126 | 0.275 | 0.268 | 0.071 | 0.045 | 0.043 |
| Arith0   | Geometry*equi-tri-height? | -2.986 | 0.714 | 0.888 | 1.232 | 0.310 | 0.385 |
| Arith0   | Geometry*decomp-trap | -0.555 | 0.304 | 0.304 | 0.146 | 0.057 | 0.060 |
| Arith0   | compose-subtract | 0.588 | 0.524 | 0.540 | 0.329 | 0.200 | 0.222 |
| Context  | parallelogram-area | 2.105 | 0.234 | 0.227 | 0.019 | 0.012 | 0.012 |
| Context  | context | 0.105 | 0.168 | 0.117 | 0.000 | 0.005 | 0.002 |
| Context  | Geometry | 0.873 | 0.168 | 0.171 | 0.016 | 0.005 | 0.006 |
| Context  | Subtract-rectangles | 2.475 | 0.571 | 0.398 | 0.000 | 0.137 | 0.091 |
| Context  | decomp-trap | -0.529 | 0.181 | 0.184 | 0.060 | 0.012 | 0.012 |
| Context  | compose-by-multiplication | 0.284 | 0.248 | 0.245 | 0.114 | 0.023 | 0.023 |
| Context  | pentagon-area | -0.552 | 0.199 | 0.197 | 0.110 | 0.015 | 0.016 |
| Context  | circle-area | 0.393 | 0.212 | 0.217 | 0.106 | 0.019 | 0.020 |
| Context  | radius-from-area | -0.427 | 0.351 | 0.347 | 0.165 | 0.089 | 0.091 |
| Context  | radius-from-circumference | 0.134 | 0.275 | 0.269 | 0.067 | 0.045 | 0.044 |
| Context  | equ-tri-height-from-base/side | -2.972 | 0.713 | 0.819 | 1.230 | 0.310 | 0.354 |
| Context  | Subtract | 0.576 | 0.523 | 0.554 | 0.336 | 0.200 | 0.227 |
| Original | ALT:PARALLELOGRAM-AREA | 2.326 | 0.250 | 0.197 | 0.011 | 0.016 | 0.013 |
| Original | ALT:PARALLELOGRAM-SIDE | 1.054 | 0.494 | 0.473 | 0.345 | 0.152 | 0.157 |
| Original | ALT:COMPOSE-BY-ADDITION | 1.035 | 0.191 | 0.135 | 0.000 | 0.012 | 0.008 |
| Original | ALT:TRAPEZOID-AREA | -0.860 | 0.344 | 0.340 | 0.344 | 0.092 | 0.094 |
| Original | ALT:TRAPEZOID-HEIGHT | -0.800 | 0.329 | 0.340 | 0.243 | 0.079 | 0.083 |
| Original | ALT:TRAPEZOID-BASE | -0.498 | 0.334 | 0.334 | 0.233 | 0.084 | 0.085 |
| Original | ALT:TRIANGLE-AREA | 0.964 | 0.249 | 0.237 | 0.042 | 0.028 | 0.027 |
| Original | ALT:TRIANGLE-SIDE | 0.122 | 0.297 | 0.245 | 0.037 | 0.056 | 0.044 |
| Original | ALT:COMPOSE-BY-MULTIPLICATION | 0.393 | 0.231 | 0.221 | 0.113 | 0.022 | 0.023 |
| Original | ALT:PENTAGON-AREA | -1.000 | 0.334 | 0.327 | 0.392 | 0.081 | 0.083 |
| Original | ALT:PENTAGON-SIDE | -0.413 | 0.235 | 0.226 | 0.151 | 0.028 | 0.029 |
| Original | ALT:CIRCLE-RADIUS | 0.360 | 0.234 | 0.210 | 0.046 | 0.027 | 0.026 |
| Original | ALT:CIRCLE-AREA | 0.473 | 0.209 | 0.197 | 0.104 | 0.019 | 0.020 |
| Original | ALT:CIRCLE-CIRCUMFERENCE | 0.876 | 0.268 | 0.251 | 0.073 | 0.037 | 0.037 |
| Original | ALT:CIRCLE-DIAMETER | 0.593 | 0.258 | 0.252 | 0.074 | 0.034 | 0.036 |